Fax: An Alternative to SGML

Authors

  • Kenneth Ward Church
  • William A. Gale
  • Jonathan Helfman
  • David D. Lewis
Abstract

We have argued elsewhere (Church and Mercer, 1993) that text is more available than ever before, and that the availability of massive quantities of data has been responsible for much of the recent interest in text analysis. Ideally, we would hope that this data would be distributed in a convenient format such as SGML (Goldfarb, 1990), but in practice, we usually have to work with the data in whatever format it happens to be in, since we usually aren’t in much of a position to tell the data providers how to do their business. Recently, we have been working with a collection of 15,000 AT&T internal documents (500,000 pages or 100 million words). Unfortunately, this data is stored in a particularly inconvenient format: fax.

Similar resources

Another Look at LaTeX to SGML Conversion

Publishers are starting to use SGML as their permanent form of storage for documents. How can LaTeX files be converted to an SGML instance? This paper discusses possible strategies, and describes an implementation by Elsevier Science of a system based on conversion in TeX itself, and extraction of SGML code from the dvi file.

SGML and XML as interchange formats for HL7 messages

OBJECTIVE To report on the use of SGML and XML (a proper subset of SGML) as transfer syntaxes for HL7 Version 2.3 and Version 3.0 messages. METHODS The methodology has focused largely on two questions: Can it be done? How best to do it? The first question is addressed by attempting to build an SGML/XML representation of HL7 messages. The second question requires a consideration of several met...

On the Interchangeability of SGML and ODA

SGML and ODA are international standards for the markup and interchange of electronic documents. These standards are incompatible, in the sense that in general a document encoded using SGML cannot be used directly in an ODA-based system, and vice versa. We first describe these two standards, and suggest criteria under which a bridge between the two standards could be evaluated. We then evaluate...

SGML-Lite - An SGML-based Programming Environment

Literate Programming is a documentation method that attempts to maintain consistency among the various design and program documents of a software system. Unfortunately the majority of the literate programming tools do not have appropriate user interfaces and require the users to learn complicated and cryptic tagging languages. SGML is a metalanguage used to specify markup or tagging languages t...

Demand More from Your SGML Database! Bringing SQL under the SGML Limelight

Have you ever been frustrated by how inadequate SGML databases are in terms of searching or querying your documents? With the current state of the art, you will easily be able to search for a word, phrase, or keywords in the whole document. Some systems let you perform approximate searches or regular expression searches. Even fewer systems let you search for keywords or phrases in certain SGML ...

Publication date: 1994